Posts tagged "python"

8 posts

Extract Tables from PDFs: 5 Methods That Actually Work

A hands-on comparison of five ways to extract tables from PDFs in Python: pdfplumber, Camelot, Tabula, AWS Textract, and manual regex. With code, benchmarks, and honest pros and cons.

By LightningPDF Team | Apr 1, 2026 | 5 min read

"pdf" "python" "tables" "extraction" "data"

Build a Document Pipeline in Python: From Database to PDF

A complete tutorial for building a Python document pipeline that queries a database, formats data with Jinja2, generates PDFs via API, and delivers them via email or S3.

By LightningPDF Team | Apr 1, 2026 | 3 min read

"python" "tutorial" "automation" "pipeline" "database"

PDF to JSON: How to Extract Structured Data from PDFs

Three practical approaches to extracting structured data from PDFs into JSON: regex on raw text, template-based extraction, and AI-powered extraction, with code for each.

By LightningPDF Team | Apr 1, 2026 | 4 min read

"pdf" "json" "python" "extraction" "api"

OCR PDF API: When You Need It and When You Don't

A practical guide to PDF OCR: how to check if a PDF actually needs OCR, Tesseract vs cloud APIs, and when you should skip OCR entirely by generating PDFs with real text layers.

By LightningPDF Team | Apr 1, 2026 | 5 min read

"pdf" "ocr" "api" "python" "tesseract"

How to Parse PDFs for RAG Pipelines

A practical guide to parsing PDFs for retrieval-augmented generation. Covers chunking strategies, PyMuPDF vs Marker vs LlamaParse, and code for extracting and embedding PDF content.

By LightningPDF Team | Apr 1, 2026 | 5 min read

"pdf" "rag" "llm" "python" "ai"

Automate Invoice Processing: From Raw Data to Branded PDF

Build an automated invoice processing pipeline that turns raw transaction data into branded PDF invoices. Includes a complete working example with an HTML template and API integration.

By LightningPDF Team | Apr 1, 2026 | 4 min read

"invoicing" "automation" "api" "python" "tutorial"

Kreuzberg vs PyMuPDF vs pdfplumber: Which PDF Parser Should You Use?

A head-to-head comparison of Kreuzberg, PyMuPDF, and pdfplumber for Python PDF parsing. Benchmarks, architecture differences, and code examples to help you pick the right tool.

By LightningPDF Team | Apr 1, 2026 | 6 min read

"python" "pdf" "extraction" "comparison" "kreuzberg" "pymupdf" "pdfplumber"

How to Extract Text from PDFs in Python (Without Losing Your Mind)

A practical guide to extracting text from PDFs in Python. Covers PyMuPDF, pdfplumber, and when you should skip extraction entirely and just generate a new PDF.

By LightningPDF Team | Mar 31, 2026 | 5 min read

"python" "pdf" "extraction" "tutorial"